OCR exploration server

Warning: this site is under development!
Warning: this site was generated automatically from raw corpora.
The information has therefore not been validated.

Does the Cost Function Matter in Bayes Decision Rule?

Internal identifier: 000320 (Main/Exploration); previous: 000319; next: 000321


Authors: Ralf Schlüter [Germany]; Markus Nussbaum-Thom [Germany]; Hermann Ney [Germany]

Source: IEEE transactions on pattern analysis and machine intelligence (ISSN 0162-8828), 2012

RBID : Pascal:12-0214212

Abstract

In many tasks in pattern recognition, such as automatic speech recognition (ASR), optical character recognition (OCR), part-of-speech (POS) tagging, and other string recognition tasks, we are faced with a well-known inconsistency: The Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas, in practice, we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as an evaluation measure. The topic of this work is to analyze the relation between string (i.e., 0-1) and symbol error (i.e., metric, integer valued) cost functions in the Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived for which the Bayes decision rule with integer-valued metric cost function and with 0-1 cost gives the same decisions or leads to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.
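The contrast the abstract draws between the 0-1 (string error) cost and a symbol-level metric cost can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the three candidate strings, their posterior probabilities, the function names, and the restriction of the hypothesis space to the strings that carry probability mass. It is not the authors' simulation setup, only a toy instance of the two decision rules.

```python
from typing import Dict


def levenshtein(a: str, b: str) -> int:
    """Edit distance between two symbol strings, unit cost per edit."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, or match at cost 0
            ))
        prev = curr
    return prev[-1]


def map_decision(posterior: Dict[str, float]) -> str:
    """Bayes decision under 0-1 cost: pick the most probable string."""
    return max(posterior, key=posterior.get)


def expected_cost(candidate: str, posterior: Dict[str, float]) -> float:
    """Posterior-expected Levenshtein cost of deciding for `candidate`."""
    return sum(p * levenshtein(candidate, w) for w, p in posterior.items())


def min_risk_decision(posterior: Dict[str, float]) -> str:
    """Bayes decision under Levenshtein cost, searched over the posterior's support."""
    return min(posterior, key=lambda c: expected_cost(c, posterior))


# Hypothetical posterior over three strings (toy values, not from the paper).
posterior = {"aaa": 0.4, "abb": 0.3, "bbb": 0.3}

print(map_decision(posterior))       # decision under 0-1 cost
print(min_risk_decision(posterior))  # decision under Levenshtein cost
```

With this toy posterior the two rules disagree: "aaa" is the most probable string, but "abb" has the lower expected edit distance (about 1.1 against 1.5) because it lies close to both competitors. With a more sharply peaked toy posterior such as {"aaa": 0.6, "abb": 0.2, "bbb": 0.2}, both rules select "aaa".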



The document in XML format

<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Does the Cost Function Matter in Bayes Decision Rule?</title>
<author>
<name sortKey="Schluter, Ralf" sort="Schluter, Ralf" uniqKey="Schluter R" first="Ralf" last="Schlüter">Ralf Schlüter</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Ahornstr. 55</s1>
<s2>Aachen 52074</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Aix-la-Chapelle</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nussbaum Thom, Markus" sort="Nussbaum Thom, Markus" uniqKey="Nussbaum Thom M" first="Markus" last="Nussbaum-Thom">Markus Nussbaum-Thom</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Ahornstr. 55</s1>
<s2>Aachen 52074</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Aix-la-Chapelle</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ney, Hermann" sort="Ney, Hermann" uniqKey="Ney H" first="Hermann" last="Ney">Hermann Ney</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Ahornstr. 55</s1>
<s2>Aachen 52074</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Aix-la-Chapelle</settlement>
</placeName>
</affiliation>
</author>
</titleStmt>
<publicationStmt>
<idno type="wicri:source">INIST</idno>
<idno type="inist">12-0214212</idno>
<date when="2012">2012</date>
<idno type="stanalyst">PASCAL 12-0214212 INIST</idno>
<idno type="RBID">Pascal:12-0214212</idno>
<idno type="wicri:Area/PascalFrancis/Corpus">000094</idno>
<idno type="wicri:Area/PascalFrancis/Curation">000678</idno>
<idno type="wicri:Area/PascalFrancis/Checkpoint">000082</idno>
<idno type="wicri:doubleKey">0162-8828:2012:Schluter R:does:the:cost</idno>
<idno type="wicri:Area/Main/Merge">000323</idno>
<idno type="wicri:Area/Main/Curation">000320</idno>
<idno type="wicri:Area/Main/Exploration">000320</idno>
</publicationStmt>
<sourceDesc>
<biblStruct>
<analytic>
<title xml:lang="en" level="a">Does the Cost Function Matter in Bayes Decision Rule?</title>
<author>
<name sortKey="Schluter, Ralf" sort="Schluter, Ralf" uniqKey="Schluter R" first="Ralf" last="Schlüter">Ralf Schlüter</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Ahornstr. 55</s1>
<s2>Aachen 52074</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Aix-la-Chapelle</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Nussbaum Thom, Markus" sort="Nussbaum Thom, Markus" uniqKey="Nussbaum Thom M" first="Markus" last="Nussbaum-Thom">Markus Nussbaum-Thom</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Ahornstr. 55</s1>
<s2>Aachen 52074</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Aix-la-Chapelle</settlement>
</placeName>
</affiliation>
</author>
<author>
<name sortKey="Ney, Hermann" sort="Ney, Hermann" uniqKey="Ney H" first="Hermann" last="Ney">Hermann Ney</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s1>Lehrstuhl für Informatik 6, Computer Science Department, RWTH Aachen University, Ahornstr. 55</s1>
<s2>Aachen 52074</s2>
<s3>DEU</s3>
<sZ>1 aut.</sZ>
<sZ>2 aut.</sZ>
<sZ>3 aut.</sZ>
</inist:fA14>
<country>Allemagne</country>
<placeName>
<region type="land" nuts="1">Rhénanie-du-Nord-Westphalie</region>
<region type="district" nuts="2">District de Cologne</region>
<settlement type="city">Aix-la-Chapelle</settlement>
</placeName>
</affiliation>
</author>
</analytic>
<series>
<title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
<imprint>
<date when="2012">2012</date>
</imprint>
</series>
</biblStruct>
</sourceDesc>
<seriesStmt>
<title level="j" type="main">IEEE transactions on pattern analysis and machine intelligence</title>
<title level="j" type="abbreviated">IEEE trans. pattern anal. mach. intell.</title>
<idno type="ISSN">0162-8828</idno>
</seriesStmt>
</fileDesc>
<profileDesc>
<textClass>
<keywords scheme="KwdEn" xml:lang="en">
<term>Annotation</term>
<term>Automatic recognition</term>
<term>Bayes decision</term>
<term>Character recognition</term>
<term>Character string</term>
<term>Classification</term>
<term>Decision rule</term>
<term>Distance</term>
<term>Grammatical inference</term>
<term>Linear complexity</term>
<term>Linguistics</term>
<term>Loss function</term>
<term>Metric</term>
<term>News</term>
<term>Optical character recognition</term>
<term>Pattern recognition</term>
<term>Speech recognition</term>
<term>Statistical analysis</term>
<term>String matching</term>
<term>Symbol</term>
<term>Symbol error rate</term>
</keywords>
<keywords scheme="Pascal" xml:lang="fr">
<term>Décision Bayes</term>
<term>Règle décision</term>
<term>Reconnaissance forme</term>
<term>Reconnaissance parole</term>
<term>Reconnaissance automatique</term>
<term>Reconnaissance optique caractère</term>
<term>Reconnaissance caractère</term>
<term>Linguistique</term>
<term>Inférence grammaticale</term>
<term>Chaîne caractère</term>
<term>Classification</term>
<term>Annotation</term>
<term>Symbole</term>
<term>Actualités</term>
<term>Métrique</term>
<term>Taux erreur symbole</term>
<term>Complexité linéaire</term>
<term>Distance</term>
<term>Analyse statistique</term>
<term>Fonction perte</term>
<term>Appariement chaîne</term>
</keywords>
<keywords scheme="Wicri" type="topic" xml:lang="fr">
<term>Linguistique</term>
<term>Classification</term>
</keywords>
</textClass>
</profileDesc>
</teiHeader>
<front>
<div type="abstract" xml:lang="en">In many tasks in pattern recognition, such as automatic speech recognition (ASR), optical character recognition (OCR), part-of-speech (POS) tagging, and other string recognition tasks, we are faced with a well-known inconsistency: The Bayes decision rule is usually used to minimize string (symbol sequence) error, whereas, in practice, we want to minimize symbol (word, character, tag, etc.) error. When comparing different recognition systems, we do indeed use symbol error rate as an evaluation measure. The topic of this work is to analyze the relation between string (i.e., 0-1) and symbol error (i.e., metric, integer valued) cost functions in the Bayes decision rule, for which fundamental analytic results are derived. Simple conditions are derived for which the Bayes decision rule with integer-valued metric cost function and with 0-1 cost gives the same decisions or leads to classes with limited cost. The corresponding conditions can be tested with complexity linear in the number of classes. The results obtained do not make any assumption w.r.t. the structure of the underlying distributions or the classification problem. Nevertheless, the general analytic results are analyzed via simulations of string recognition problems with Levenshtein (edit) distance cost function. The results support earlier findings that considerable improvements are to be expected when initial error rates are high.</div>
</front>
</TEI>
<affiliations>
<list>
<country>
<li>Allemagne</li>
</country>
<region>
<li>District de Cologne</li>
<li>Rhénanie-du-Nord-Westphalie</li>
</region>
<settlement>
<li>Aix-la-Chapelle</li>
</settlement>
</list>
<tree>
<country name="Allemagne">
<region name="Rhénanie-du-Nord-Westphalie">
<name sortKey="Schluter, Ralf" sort="Schluter, Ralf" uniqKey="Schluter R" first="Ralf" last="Schlüter">Ralf Schlüter</name>
</region>
<name sortKey="Ney, Hermann" sort="Ney, Hermann" uniqKey="Ney H" first="Hermann" last="Ney">Hermann Ney</name>
<name sortKey="Nussbaum Thom, Markus" sort="Nussbaum Thom, Markus" uniqKey="Nussbaum Thom M" first="Markus" last="Nussbaum-Thom">Markus Nussbaum-Thom</name>
</country>
</tree>
</affiliations>
</record>
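The record above can also be read with a general-purpose XML parser. The following Python sketch works on a shortened copy of the record and makes one assumption explicit: as displayed, the record uses the prefixes `wicri:` and `inist:` without declaring them, so a strict parser would reject it. The sketch therefore wraps the text in a `wrapper` element that binds both prefixes to placeholder URIs. The wrapper element and the `urn:example:...` URIs are hypothetical and are not part of the Wicri or INIST formats.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the record, kept only to make the example self-contained.
FRAGMENT = """
<record>
<TEI>
<teiHeader>
<fileDesc>
<titleStmt>
<title xml:lang="en" level="a">Does the Cost Function Matter in Bayes Decision Rule?</title>
<author>
<name sortKey="Schluter, Ralf" first="Ralf" last="Schlüter">Ralf Schlüter</name>
<affiliation wicri:level="3">
<inist:fA14 i1="01">
<s2>Aachen 52074</s2>
</inist:fA14>
<country>Allemagne</country>
</affiliation>
</author>
</titleStmt>
</fileDesc>
</teiHeader>
</TEI>
</record>
"""

# Placeholder namespace URIs (assumption): they only make the prefixes resolvable.
WRAPPER_OPEN = (
    '<wrapper xmlns:wicri="urn:example:wicri" '
    'xmlns:inist="urn:example:inist">'
)

root = ET.fromstring(WRAPPER_OPEN + FRAGMENT + "</wrapper>")

# Unprefixed elements are looked up by their plain tag names.
title = root.find(".//titleStmt/title").text
authors = [name.text for name in root.iterfind(".//titleStmt/author/name")]

# Prefixed elements are addressed through the placeholder URI in braces.
city_line = root.find(".//{urn:example:inist}fA14/s2").text

print(title)
print(authors)
print(city_line)
```

If the stored file already declares these namespaces, the wrapper is unnecessary and the real URIs should be used in the lookups instead.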

To manipulate this document under Unix (Dilib)

EXPLOR_STEP=$WICRI_ROOT/Ticri/CIDE/explor/OcrV1/Data/Main/Exploration
HfdSelect -h $EXPLOR_STEP/biblio.hfd -nk 000320 | SxmlIndent | more

Or

HfdSelect -h $EXPLOR_AREA/Data/Main/Exploration/biblio.hfd -nk 000320 | SxmlIndent | more

To link to this page within the Wicri network

{{Explor lien
   |wiki=    Ticri/CIDE
   |area=    OcrV1
   |flux=    Main
   |étape=   Exploration
   |type=    RBID
   |clé=     Pascal:12-0214212
   |texte=   Does the Cost Function Matter in Bayes Decision Rule?
}}


This area was generated with Dilib version V0.6.32.
Data generation: Sat Nov 11 16:53:45 2017. Site generation: Mon Mar 11 23:15:16 2024